fix: Like::TestString to align with Java LIKE semantics #320
Merged
Pull request overview
Fixes Like::TestString to match Java Paimon's LIKE semantics: strict escape validation, UTF-8 code-point-aware _ wildcard, and Java regex line-terminator handling. Also replaces the alloca-based DP buffer with a std::vector<bool>.
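The UTF-8 code-point decomposition behind the new `_` semantics can be sketched roughly as follows. This is a minimal C++17 illustration, not the code in like.cpp: `DecodeUtf8` is a hypothetical helper, and returning `std::nullopt` for malformed input is an assumption, since how the real implementation reports bad UTF-8 is not described here. It checks only lead and continuation bit patterns.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical helper (not the paimon-cpp API): splits a UTF-8 byte string into
// Unicode code points so that a LIKE '_' can consume one character, not one byte.
// Returns std::nullopt on truncated sequences or bad lead/continuation bytes.
// Minimal validation only: overlong forms and surrogates are not rejected.
std::optional<std::vector<char32_t>> DecodeUtf8(std::string_view s) {
  std::vector<char32_t> out;
  std::size_t i = 0;
  while (i < s.size()) {
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    char32_t cp = 0;
    if (lead < 0x80) {
      len = 1;  // 0xxxxxxx: ASCII
      cp = lead;
    } else if ((lead & 0xE0) == 0xC0) {
      len = 2;  // 110xxxxx
      cp = lead & 0x1F;
    } else if ((lead & 0xF0) == 0xE0) {
      len = 3;  // 1110xxxx
      cp = lead & 0x0F;
    } else if ((lead & 0xF8) == 0xF0) {
      len = 4;  // 11110xxx
      cp = lead & 0x07;
    } else {
      return std::nullopt;  // stray continuation byte or invalid lead byte
    }
    if (len > s.size() - i) return std::nullopt;  // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char c = static_cast<unsigned char>(s[i + k]);
      if ((c & 0xC0) != 0x80) return std::nullopt;  // must be 10xxxxxx
      cp = (cp << 6) | (c & 0x3F);
    }
    out.push_back(cp);
    i += len;
  }
  return out;
}
```

With the field decoded this way, `é` (two bytes) and `你` (three bytes) each become a single element, so a `_` that consumes one element matches one character, as Java's regex `.` does.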
Changes:
- Parse the pattern in two phases: validate escapes (only `\_`, `\%`, and `\\` are permitted; a trailing `\` and any other escape raise `Status::Invalid`), then merge consecutive `%`.
- Decompose both the pattern and the field into UTF-8 code points so that `_` matches one character (not one byte) and rejects Java line terminators (`\n`, `\r`, `U+0085`, `U+2028`, `U+2029`); `%` still matches anything. The minimum-length quick reject now counts `_` as a required character.
- Replace the `alloca`/`unique_ptr<bool[]>` DP storage with `std::vector<bool>`; add tests covering invalid escapes, escaped `\`/`%`, multibyte `_`, and line-terminator semantics.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/paimon/common/predicate/like.cpp | Rewrites Like::TestString for Java-compatible escape validation, UTF-8 _ semantics, line-terminator handling, and std::vector-based DP. |
| src/paimon/common/predicate/predicate_test.cpp | Adds four TEST_F cases for invalid escapes, escaped backslash/percent, UTF-8 multibyte _, and line-terminator semantics. |
zjw1111 reviewed on Jun 1, 2026
Purpose
No linked issue.
Description:
- Treat a trailing `\` and invalid escapes as errors
- Make `_` match one UTF-8 character instead of one byte
- Keep `%` matching any sequence
- Remove `alloca` and use `std::vector<bool>` instead
Tests
- Escaped `\` and `%` tests
- UTF-8 `_` matching tests
API and Format
Documentation
Generative AI tooling
Aone Copilot (Claude)